Abstract

Besides the Hidden Subgroup Problem, the second large class of quantum speed-ups is for 
functions with constant-sized 1-certificates. This includes the OR function, solvable by the 
Grover algorithm, the distinctness, the triangle and other problems. The usual way to solve 
them is by quantum walk on the Johnson graph. 

We propose a solution for the same problems using span programs. The span program is a
computational model equivalent to the quantum query algorithm in strength, and yet very
different in its formulation.

We prove the power of our approach by designing a quantum algorithm for the triangle
problem with query complexity $O(n^{35/27})$, which is better than the $O(n^{13/10})$ of the best
previously known algorithm by Magniez et al.

1 Introduction 

In this paper, we are interested in quantum query complexity of functions with 1-certificate com- 
plexity bounded by a constant. Research on quantum algorithms for such functions was launched 
shortly after the beginnings of quantum computation. The first example is that of Grover search [7] 
for the OR function. The distinctness and the triangle problems also belong to this class. 

One can distinguish two main design paradigms for quantum algorithms for such functions.
The first one includes application of the Grover search and its close relative, quantum amplitude
amplification. This paradigm resulted in the algorithm for the collision problem with complexity
$O(n^{1/3})$ by Brassard et al. [3], the $O(n^{3/4})$-algorithm for the distinctness problem by Buhrman et
al. [4], the $O(n^{10/7})$-algorithm for the triangle problem by Magniez et al. [8], and others.

The second paradigm is based on quantum walks on the Johnson graph. It was pioneered
by Ambainis with his $O(n^{2/3})$-algorithm for the distinctness problem [1]. The triangle-finding
algorithm with complexity $O(n^{13/10})$ by Magniez et al. [8] also belongs to this class.

In this paper, we propose an approach to these problems using span programs. The span 
program is a computational model proven by Reichardt to be equivalent to the quantum query 
algorithm [10, 11]. Despite this equivalence, the actual applications of this model have been limited,
mostly, to formulae evaluation [T^ . 

We show that span programs are useful for other well-studied problems in quantum
computation. We build analogues of the algorithms for the OR function and the distinctness problem
with optimal complexity. We demonstrate the power of our approach by designing an algorithm
for the triangle problem with complexity $O(n^{35/27})$, which is better than the $O(n^{13/10})$ of the
algorithm by Magniez et al.

The paper is organized as follows. In Section 2, we review the notion of certificate complexity,
define the problems being solved, and describe the span programs. In Section 3, we define learning
graphs as our approach to functions with small 1-certificate complexity. In Section 4, we introduce
the concept of stages, illustrated by the example of a learning graph for the distinctness
problem. In Section 5, we discuss how symmetry of the problem can be used in the design of
learning graphs, and finally, in Section 6, we give our algorithm for the triangle problem.

2 Preliminaries 

2.1 Quantum Query Algorithms for Functions with Constant-Sized 1-certificates

For the basic concepts of quantum algorithms, a reader may refer to [5]. We are interested in 
query complexity of quantum algorithms, i.e., we measure the complexity of a problem by the
number of queries to the input that the best algorithm has to make. Clearly, query complexity provides
a lower bound on time complexity. For many algorithms, query complexity is easier to analyse
than time complexity. For the definition of query complexity and its basic properties, as well as
properties of certificate complexity, a good reference is [S]. 

Functions we work with in this paper have bounded 1-certificate complexity. Let us define
what this means. Consider a multivariable function $f : [m]^n \to [2]$. By $[m]$, we denote the set
$\{0, 1, \dots, m-1\}$. We identify the set of input variables of $f$ with the set $[n]$. An assignment is a
function $\alpha : [n] \supseteq S \to [m]$. One should think of this function as fixing values for some input
variables. We say input $x = (x_i)_{i \in [n]}$ agrees with assignment $\alpha$ if $\alpha(i) = x_i$ for all $i \in S$. The size
of an assignment is the size of its domain $S$.

Assignment $\alpha$ is called a $b$-certificate for $f$ if any input that agrees with $\alpha$ is mapped to $b$ by $f$.
The certificate complexity $C_x(f)$ of function $f$ on input $x$ is defined as the minimal size of a
certificate for $f$ that agrees with $x$. The $b$-certificate complexity $C^{(b)}(f)$ is defined as
$\max_{x \in f^{-1}(b)} C_x(f)$.
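As an illustration of these definitions, certificate complexity can be computed by brute force for very small functions. The following Python sketch is not part of the paper's construction: the function names are hypothetical, and the search is exponential in $n$, so it is only meant to make the definitions concrete.

```python
from itertools import combinations, product

def fill(x, free, vals):
    """Copy of x with the variables listed in `free` replaced by `vals`."""
    y = list(x)
    for i, v in zip(free, vals):
        y[i] = v
    return tuple(y)

def certificate_complexity(f, x, m):
    """Minimal size of a certificate for f that agrees with input x."""
    n = len(x)
    for size in range(n + 1):
        for S in combinations(range(n), size):
            free = [i for i in range(n) if i not in S]
            # S carries a certificate if every input agreeing with x on S has value f(x).
            if all(f(fill(x, free, vals)) == f(x)
                   for vals in product(range(m), repeat=len(free))):
                return size

def b_certificate_complexity(f, n, m, b):
    """Maximum of C_x(f) over all inputs x with f(x) = b."""
    return max(certificate_complexity(f, x, m)
               for x in product(range(m), repeat=n) if f(x) == b)

def or_function(x):
    return int(any(x))

def distinctness(x):
    return int(len(set(x)) < len(x))

print(b_certificate_complexity(or_function, 4, 2, 1))   # 1-certificate complexity of OR
print(b_certificate_complexity(or_function, 4, 2, 0))   # 0-certificate complexity of OR
print(b_certificate_complexity(distinctness, 3, 3, 1))  # 1-certificate complexity of distinctness
```

The first two calls contrast the constant 1-certificate complexity of OR with its 0-certificate complexity, which grows with $n$.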

As mentioned above, we are interested in algorithms for families of functions such that $C^{(1)}(f)$
remains bounded by a constant. Many quantum algorithms have been constructed for functions
from this class. We work mostly with the following three functions.

OR function The simplest example of a function with constant 1-certificate complexity is the
OR function. It is easy to see that $C^{(1)}(\mathrm{OR}) = 1$, since it is enough to pick any input variable
equal to 1. Note that the 0-certificate complexity of this function is $n$.

The quantum algorithm for the OR function is one of the first quantum algorithms. It was
invented by Grover [7] and it has query complexity $O(\sqrt{n})$. This is optimal [6].

Distinctness Problem The distinctness function is the function $f : [m]^n \to [2]$ such that $f(x_1, \dots, x_n)$
equals 1 iff there are equal elements among $x_1, \dots, x_n$. It has 1-certificate complexity 2.

The first quantum algorithm for the distinctness problem had complexity $O(n^{3/4})$ and was due
to Buhrman et al. [4]. This was later improved to $O(n^{2/3})$ by Ambainis [1]. This is the first
natural problem solved by an algorithm based on a quantum walk. This algorithm is optimal, due
to the result by Shi [IB] . 

Triangle Problem Consider a complete graph on $n$ vertices. The input variables of the function
correspond to the edges of the graph. Denote by $x_{ij}$, where $1 \le i < j \le n$, the input variable
corresponding to the edge joining vertices $i$ and $j$. The task is to detect whether there is a triangle
with all edges marked by 1, i.e., whether there are indices $i < j < k$ such that $x_{ij} = x_{ik} = x_{jk} = 1$.
Clearly, the 1-certificate complexity of the triangle problem is 3.
The best previously known quantum query algorithm for the triangle problem is due to Magniez
et al. [8] and has complexity $O(n^{13/10})$. We describe this algorithm in Section 6. In the same
section, we improve the complexity to $O(n^{35/27})$. The best known lower bound is just $\Omega(n)$.
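To make the input encoding of the triangle problem concrete, here is a minimal classical sketch in Python. It is only an illustration of the function being computed, not of any quantum algorithm; the names are hypothetical, and vertices are numbered from 0 rather than from 1.

```python
from itertools import combinations

def has_triangle(n, x):
    """Triangle function: x maps each pair (i, j) with i < j to 0 or 1."""
    return int(any(x[(i, j)] and x[(i, k)] and x[(j, k)]
                   for i, j, k in combinations(range(n), 3)))

def empty_input(n):
    """Input with every edge variable set to 0."""
    return {pair: 0 for pair in combinations(range(n), 2)}

x = empty_input(5)
for pair in [(0, 2), (0, 4), (2, 4), (1, 3)]:
    x[pair] = 1
print(has_triangle(5, x))  # the three edges on vertices 0, 2, 4 form a 1-certificate
```

Any three edge variables forming a triangle and equal to 1 certify the value 1, which is the 1-certificate of size 3 mentioned above.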

2.2 Span Programs 

In this section, we define span programs following, mostly, [10]. A span program $\mathcal{P}$ is a way of
computing a Boolean function $\{0,1\}^m \to \{0,1\}$. It is defined by

• A finite-dimensional inner product space $V = \mathbb{R}^n$. Reichardt et al. define span programs
over $\mathbb{C}$; we find real span programs more convenient. Real span programs are known to be
equivalent to the complex ones [10, Lemma 4.11];

• A non-zero target vector $t \in V$;

• A set of input vectors $I \subset V$. The set $I$ is split into the union of the set of free input vectors
$I_{\mathrm{free}}$ and the collection of sets $\{I_{j,b}\}$ with $j = 1, \dots, m$ and $b = 0, 1$: $I = I_{\mathrm{free}} \cup \bigcup_{j,b} I_{j,b}$. The
input vectors of $I_{j,b}$ are labeled by the tuple of the $j$-th input variable $x_j$ and its possible
value $b$.

For each input $x = (x_j) \in \{0,1\}^m$, define the set of available input vectors as
$I(x) = I_{\mathrm{free}} \cup \bigcup_{j=1}^{m} I_{j,x_j}$. Its complement $I \setminus I(x)$ is called the set of false input vectors. We say
that $\mathcal{P}$ evaluates to 1 on input $x$ iff $t \in \mathrm{span}(I(x))$. In this way, span programs define total Boolean
functions. One can define a span program for a partial Boolean function as well, by ignoring the
output of the program on the complement of the domain.

A useful notion of complexity for a span program is that of witness size. Assume, up to the
end of the section, that a span program $\mathcal{P}$ calculates a partial Boolean function $f : \mathcal{D} \to \{0,1\}$ with
$\mathcal{D} \subseteq \{0,1\}^m$. Let $A$ and $A(x)$ be matrices having $I$ and $I(x)$ as their columns, respectively.

If $\mathcal{P}$ evaluates to 1 on input $x \in \mathcal{D}$, a witness for this input is any vector $w \in \mathbb{R}^{|I(x)|}$ such that
$A(x) w = t$. The size of $w$ is defined as its norm squared $\|w\|^2$.

If, on the contrary, $f(x) = 0$, then a witness for this input is any vector $w' \in V$ such that $\langle w', t \rangle = 1$
and that is orthogonal to all vectors from $I(x)$. Since $t \notin \mathrm{span}(I(x))$, such a vector exists. The
size of $w'$ is defined as $\|A^{T} w'\|^2$. Note that this equals the sum of squares of inner products of $w'$
with all false input vectors.

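The witness size $\mathrm{wsize}(\mathcal{P}, x)$ of span program $\mathcal{P}$ on input $x$ is defined as the minimal size among
all witnesses for $x$ in $\mathcal{P}$. We also use notation

$$\mathrm{wsize}_b(\mathcal{P}, \mathcal{D}) = \max_{x \in \mathcal{D} :\, f(x) = b} \mathrm{wsize}(\mathcal{P}, x).$$

The witness size of $\mathcal{P}$ is defined as

$$\mathrm{wsize}(\mathcal{P}, \mathcal{D}) = \sqrt{\mathrm{wsize}_0(\mathcal{P}, \mathcal{D}) \, \mathrm{wsize}_1(\mathcal{P}, \mathcal{D})}.$$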

This is not a standard definition, but it appears as equation (2.8) in [10].

The following important theorem is a combination of results from [11] and [10], and it shows
why span programs are important for quantum computation:

Theorem 1. For any partial Boolean function $f : \{0,1\}^n \supseteq \mathcal{D} \to \{0,1\}$ and for any span
program $\mathcal{P}$ computing $f$, there exists a 2-sided bounded error quantum algorithm calculating $f$ in
$O(\mathrm{wsize}(\mathcal{P}, \mathcal{D}))$ queries.

Thus, a search for a good quantum query algorithm is essentially equivalent to a search for a 
span program with small witness size. 
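A small worked example may help to fix the definitions of this section. The sketch below evaluates the witness size of a one-dimensional span program for the OR function: the target is the scalar 1, and for every $j$ there is a single input vector (again the scalar 1) labeled by $x_j = 1$. This program is a standard illustration chosen here as an assumption; it is not the construction used later in the paper, and the function name is hypothetical.

```python
import math
from itertools import product

def or_span_program_wsize(n):
    """Witness size of a 1-dimensional span program for OR on n bits."""
    t = 1.0
    vec = [1.0] * n            # vec[j] is available iff x_j == 1; there are no free vectors
    wsize = {0: 0.0, 1: 0.0}
    for x in product((0, 1), repeat=n):
        available = [vec[j] for j in range(n) if x[j] == 1]
        false = [vec[j] for j in range(n) if x[j] == 0]
        if available:
            # Positive witness: the minimum-norm w with sum_j a_j w_j = t.
            norm2 = sum(a * a for a in available)
            w = [a * t / norm2 for a in available]
            wsize[1] = max(wsize[1], sum(c * c for c in w))
        else:
            # Negative witness: in one dimension, w' = 1/t is the only vector with <w', t> = 1.
            w_neg = 1.0 / t
            wsize[0] = max(wsize[0], sum((w_neg * a) ** 2 for a in false))
    return math.sqrt(wsize[0] * wsize[1])

print(or_span_program_wsize(4))
```

The worst positive input has a single 1, giving size 1, and the only negative input is the all-zero string, giving size $n$, so the witness size is $\sqrt{n}$.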
3 Learning graphs



3.1 Definitions 

Our main model of computation for this paper is the learning graph, or just L-graph. It is a 
directed acyclic connected graph with vertices being subsets of the set of input variables. Usually, 
we identify the latter with [n], where n is the number of input variables. Sometimes, we call the 
vertices of the learning graph L-vertices. 

One may think of the learning graph as simulating the development of our knowledge about the
input. Initially, we know nothing about the input, and this is represented by vertex $\emptyset$. When in vertex
$S \subseteq [n]$, the values of the variables in $S$ have been learned. For any $j \in [n] \setminus S$, vertex $S$ can be
connected to $S \cup \{j\}$ by an arc. This can be interpreted as querying the value of variable $x_j$. We
say the arc loads element $j$. When talking about vertex $S$, we call $S$ the set of loaded elements.

Each arc $e$ is assigned a positive real number $w_e$, its weight. A learning graph is similar to a
randomized decision tree, with some differences. First of all, it is not a tree. Secondly, the values
of the input variables do not figure in the model (see, however, Remark 4). And finally, there is
no restriction on the weights of the arcs.

In order for a learning graph to calculate function $f$ correctly, the following property should
be assured: for any $x \in f^{-1}(1)$, there exists a 1-certificate for $x$ contained in a vertex of the
learning graph. We call such vertices accepting. For any correct learning graph, one can define its
complexity as the geometric mean of its positive and negative complexities.

Let $E$ be the set of arcs. The negative complexity of the learning graph is defined as $\sum_{e \in E} w_e$.
The positive complexity is more subtle. Fix an input $x \in f^{-1}(1)$ and consider a flow $p_e$ on the
learning graph such that

• vertex $\emptyset$ is the only source of the flow, and it has intensity 1. In other words, the sum of $p_e$
over all $e$'s leaving $\emptyset$ is 1;

• vertex $S$ is a sink iff it is accepting. That is, if $S \ne \emptyset$ and $S$ does not contain a 1-certificate
for $x$ then, for vertex $S$, the sum of $p_e$ over all in-coming arcs equals the sum of $p_e$ over all
out-going arcs.

The complexity of the flow is defined as $\sum_{e \in E} p_e^2 / w_e$. The complexity for input $x$ is the minimum
complexity over all possible flows satisfying these conditions. The positive complexity of the
learning graph is the maximum complexity over all $x$'s such that $f(x) = 1$.

We will also talk about flow $p_v$ through a vertex $v$. It is defined as the sum of $p_e$ over all arcs
ending at $v$.

Remark 2. Under the reasonable assumption $p_e \ge 0$, one can consider the above flow as a random
walk. Indeed, consider the probability distribution on paths starting at $\emptyset$ and finishing in an
accepting vertex, such that the probability that an arc $e$ is used in the path is exactly $p_e$. In contrast
to random walks used previously to build quantum walks, this is a random walk through
the graph rather than on it. We utilize this probability language in Section 6.

For most of our applications, we consider only one certificate $\alpha$ for each input $x \in f^{-1}(1)$. We
call the elements inside the domain of $\alpha$ marked. Then, the only task of the learning graph is to
load all the marked elements. We construct learning graphs in order to minimize the complexity 
of this loading. 

The following theorem links learning graphs and quantum query complexity. 

Theorem 3. For any learning graph for a function $f : [m]^n \to \{0,1\}$ with complexity $C$, there
exists a bounded error quantum query algorithm for the same function with complexity $O(C \log m)$.

The theorem is proven in Section 3.3, but before that we give a warm-up example.
3.2 Grover Search



We start with the description of a learning graph corresponding to the Grover algorithm. Recall
that it calculates the OR function. An assignment $x_j \mapsto 1$ is a 1-certificate for every $j$.

The learning graph is quite simple. It has vertices $\emptyset$ and $\{1\}, \dots, \{n\}$. Vertex $\emptyset$ is connected
by an arc of weight 1 to each of the $\{i\}$'s.

Clearly, the negative complexity is $n$. Let us calculate the positive complexity. Let $i$ be such
that $x_i = 1$. Define the flow equal to 1 on the arc from $\emptyset$ to $\{i\}$ and 0 on all other arcs. This
gives the positive complexity 1. Hence, the complexity of the learning graph is $\sqrt{n}$, which coincides with
the complexity of the Grover algorithm.

As is well known, if it is promised that either none of the $x_i$'s equals 1, or at least $r$ of
them do, the complexity of the Grover algorithm becomes $O(\sqrt{n/r})$. This can be shown using the
same learning graph. Let $M$ be the set of input variables equal to 1. Define the flow as $1/|M|$
along an arc to a vertex containing an element of $M$, and 0 otherwise. This gives the positive
complexity $|M| \cdot \frac{1}{|M|^2} = 1/|M|$. Hence, the total complexity is $\sqrt{n/|M|} \le \sqrt{n/r}$.

This illustrates the main point about positive complexity. We want to distribute the flow as 
evenly as possible along as many paths as possible. Doing so reduces the complexity because of 
convexity of the square function. 
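The effect of spreading the flow can be checked numerically on the Grover learning graph. The Python sketch below is an illustration under stated assumptions: the helper name and the positions of the marked elements are hypothetical, and the function evaluates the complexity for a single input rather than the maximum over all inputs.

```python
import math

def learning_graph_complexity(weights, flow):
    """sqrt(negative * positive) for one input; arcs are the dictionary keys."""
    negative = sum(weights.values())
    positive = sum(flow.get(e, 0.0) ** 2 / w for e, w in weights.items())
    return math.sqrt(negative * positive)

n = 16
weights = {i: 1.0 for i in range(n)}            # arc from the empty set to {i}, weight 1
marked = [2, 5, 7, 11]                          # hypothetical positions of the ones

even = {i: 1.0 / len(marked) for i in marked}   # flow spread over all marked arcs
single = {marked[0]: 1.0}                       # all flow on one marked arc

print(learning_graph_complexity(weights, even))    # equals sqrt(n / |M|)
print(learning_graph_complexity(weights, single))  # equals sqrt(n)
```

Both flows are valid for the same input; the even one is cheaper, which is the convexity argument made above.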

3.3 Proof of Theorem 3

We aim to apply Theorem 1, i.e., to build a span program and estimate its witness size. Let us
start with the Boolean case $m = 2$.

Description of the Span Program Let us describe the vector space of the span program.
Each vertex $S$ of the learning graph is represented by $2^{|S|}$ vectors $\{t_\alpha\}$, where $\alpha$ is an element
of $[2]^S$. We assume all these vectors are orthonormal. One may think of $\alpha$ as representing the
values learned while querying elements of $S$, while vertex $S$ represents the sole fact that the variables
have been queried. Vector $t_\emptyset$, which corresponds to vertex $\emptyset$, is the target of the span program. If
$\alpha : S \to [2]$ is a 1-certificate for $f$, $t_\alpha$ is a free input vector.

Consider an arc $e$ from $S$ to $S \cup \{j\}$ with weight $w_e$. For each vector $t_\alpha$ such that $\alpha$ has domain
$S$, we add two input vectors

$$\sqrt{w_e}\,\bigl(t_\alpha - t_{\alpha \cup \{j \mapsto b\}}\bigr), \qquad b = 0, 1. \tag{1}$$

Here $\alpha \cup \{j \mapsto b\}$ is the assignment with domain $S \cup \{j\}$ that maps $i$ to $\alpha(i)$ for $i \in S$ and maps $j$
to $b$. Each of these two vectors is labeled by value $b$ of variable $x_j$.

Negative Witness Size Let us describe the negative witness $w'$ of the span program. Fix an
input $x \in f^{-1}(0)$. For each $S$, define $\iota(S)$ as the only assignment $S \to [2]$ agreeing with the input.
For each $t_\alpha$, we define $\langle w', t_\alpha \rangle = 1$ if $\alpha$ agrees with the input. Otherwise, we define $\langle w', t_\alpha \rangle = 0$.

Consider a free input vector of the form $t_\alpha$. Since $f(x) = 0$ and $\alpha$ is a 1-certificate, $\alpha$ does not
agree with the input. By the construction, $t_\alpha$ is orthogonal to the witness.

Consider an available input vector of the form (1). There are two cases:

1. Inner product $\langle w', t_\alpha \rangle$ equals 0. In this case $\alpha$ does not agree with the input, and, a fortiori,
neither $\alpha \cup \{j \mapsto 0\}$ nor $\alpha \cup \{j \mapsto 1\}$ agrees with the input. Hence, both vectors of (1) are
orthogonal to the witness.

2. Inner product $\langle w', t_\alpha \rangle$ equals 1. In this case $\langle w', t_{\alpha \cup \{j \mapsto b\}} \rangle = 1$ if $x_j = b$, and the available
vector is exactly the one labeled by $b = x_j$.

In both cases, the available input vector of (1) is orthogonal to the witness. This proves that $w'$
indeed is a negative witness.

Let us calculate the size of $w'$. Let $e$ be an arc from $S$ to $S \cup \{j\}$. We claim there is exactly one
input vector that arises from $e$ and isn't orthogonal to the witness. Let $\alpha$ have domain $S$. By the
first point above, if $\alpha$ does not agree with the input, both input vectors of (1) are orthogonal to
$w'$. If $\alpha = \iota(S)$, the inner product of the false input vector from (1) and $w'$ is $\sqrt{w_e}$. By summation
over all arcs, we have the size of $w'$ equal to $\sum_{e \in E} w_e$, i.e., to the negative complexity of the learning
graph.

Positive Witness Size Now, let us calculate the positive witness size. Fix an input $x$ such that
$f(x) = 1$, and let $p_e$ be the corresponding flow. We will give a linear combination of the available
input vectors that equals $t_\emptyset$.

Let $e$ be an arc from $S$ to $S \cup \{j\}$ with weight $w_e$. Let $\alpha = \iota(S)$ and take the available
input vector from (1) with coefficient $p_e / \sqrt{w_e}$. Multiplied by the coefficient, the vector equals

$$p_e \bigl(t_{\iota(S)} - t_{\iota(S \cup \{j\})}\bigr).$$

Suppose vertex $S$ is a sink. Then $t_{\iota(S)}$ is a free input vector. Take it with the coefficient equal
to the difference of the in-flow to $S$ and the out-flow of $S$.

By the properties of the flow, the sum of all these vectors equals $t_\emptyset$, which is the target vector.
The witness size is $\sum_{e \in E} p_e^2 / w_e$, i.e., the positive complexity of the learning graph. This proves the
theorem for the Boolean case.
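The Boolean construction can be checked mechanically on the smallest example. The Python sketch below builds the arc vectors of the form $\sqrt{w_e}(t_\alpha - t_{\alpha \cup \{j \mapsto b\}})$ for the Grover learning graph on three variables and verifies both witnesses. It is an illustrative check, not part of the proof: the data layout and helper names are assumptions, variables are numbered from 0, and only arc vectors are counted in the positive witness size, following the accounting in the text.

```python
import math
from itertools import product

def assignments(S):
    """All Boolean assignments with domain S, as frozensets of (variable, value)."""
    S = sorted(S)
    return [frozenset(zip(S, bits)) for bits in product((0, 1), repeat=len(S))]

def agrees(alpha, x):
    return all(x[i] == v for i, v in alpha)

def arc_vectors(arcs):
    """Two input vectors per arc and assignment alpha, labeled by (j, b)."""
    out = []
    for (S, j), w in arcs.items():
        for alpha in assignments(S):
            for b in (0, 1):
                beta = alpha | {(j, b)}
                out.append({"arc": (S, j), "label": (j, b), "alpha": alpha,
                            "vec": {alpha: math.sqrt(w), beta: -math.sqrt(w)}})
    return out

def is_available(v, x):
    j, b = v["label"]
    return x[j] == b

n = 3
empty = frozenset()
arcs = {(empty, j): 1.0 for j in range(n)}      # Grover learning graph, unit weights
vectors = arc_vectors(arcs)

# Negative witness for x = 000: <w', t_alpha> = 1 iff alpha agrees with x.
x0 = (0, 0, 0)
def witness_dot(vec):
    return sum(c for alpha, c in vec.items() if agrees(alpha, x0))

orthogonal = all(abs(witness_dot(v["vec"])) < 1e-12
                 for v in vectors if is_available(v, x0))
negative_size = sum(witness_dot(v["vec"]) ** 2
                    for v in vectors if not is_available(v, x0))

# Positive witness for x = 010: unit flow on the arc loading variable 1.
x1 = (0, 1, 0)
flow = {(empty, 1): 1.0}
combination, positive_size = {}, 0.0
for v in vectors:
    p = flow.get(v["arc"], 0.0)
    if p > 0 and is_available(v, x1) and agrees(v["alpha"], x1):
        coeff = p / math.sqrt(arcs[v["arc"]])
        positive_size += coeff ** 2
        for basis, c in v["vec"].items():
            combination[basis] = combination.get(basis, 0.0) + coeff * c
# The sink {1} contributes its free vector with coefficient in-flow minus out-flow = 1.
sink = frozenset({(1, 1)})
combination[sink] = combination.get(sink, 0.0) + 1.0
combination = {a: c for a, c in combination.items() if abs(c) > 1e-12}

print(orthogonal, negative_size, positive_size, combination == {empty: 1.0})
```

The negative witness size matches the sum of the arc weights, and the positive combination collapses to the target vector indexed by the empty assignment.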

Non-Boolean Case Now consider the case $m > 2$. Fix a representation of elements of $[m]$ using
$k = \lceil \log m \rceil$ bits. For $j \in [n]$, construct a set $B_j$ of $k$ Boolean variables representing $x_j$. Consider
the Boolean function $f' : [2]^{nk} \to [2]$ obtained from $f$ by encoding the input variables. One can
construct a learning graph $G'$ for $f'$ from the learning graph $G$ for $f$ in the following way. Replace
each vertex $S$ by $S' = \bigcup_{j \in S} B_j$. For an arc from $S$ to $S \cup \{j\}$ with weight $w_e$, fix an arbitrary order
$y_1, \dots, y_k$ of elements in $B_j$ and represent the arc as the path $S', S' \cup \{y_1\}, \dots, S' \cup \{y_1, \dots, y_k\}$ in $G'$
of $k$ arcs, each of weight $w_e$.

Clearly, the negative complexity of $G'$ is $k$ times the negative complexity of $G$. Accepting
vertices of $G$ are transformed into accepting vertices of $G'$, and each flow through $G$ can be
transformed into a flow through $G'$ in an obvious way. This increases the complexity of the flow
$k$ times. Hence, the complexity of $G'$ is at most $k$ times the complexity of $G$, and, for $G'$, we can
apply the construction from the first part of the proof.

3.4 Additional Remarks 

The construction used in the proof of the non-Boolean case of Theorem 3 is a special case of
multiplexor from p]. The main reason behind the appearance of the logm factor in the witness 
size of the span program is the representation of an m-ary variable by a set of log m Boolean 
variables. It is tempting to claim that this factor can be removed, if one allows queries to the m- 
ary variable directly, as it usually is done in quantum query algorithms for non-Boolean functions. 
Unfortunately, we do not know yet how to use such queries in span programs. 

Often, it will be convenient to use more than one vertex with the same subset $S$ in the learning
graph. We will distinguish them using some additional labels. Theorem 3 also holds for such
learning graphs. One may prove that by noticing that the proof of the theorem does not change if
one adds additional labels to the vertices. Another, more illuminating reasoning is as follows.

Assume we have a vertex $S$ in a learning graph $G$ with arcs $e_1, \dots, e_k$ going to vertices
$(S', 1), \dots, (S', k)$, respectively, that represent the same subset of input variables $S' = S \cup \{j\}$.
Let the weights of the arcs be $w_1, \dots, w_k$. Replace the $k$ vertices by one vertex $S'$ and the $k$ arcs by
one arc $e$ connecting $S$ to $S'$ of weight $w_1 + \dots + w_k$, and call the resulting learning graph $G'$.
Clearly, this does not change the negative complexity of the learning graph.

For the positive complexity, assume we have a flow on $G$ with flow through $e_i$ equal to $p_i$. We
can construct the corresponding flow on $G'$ by sending flow $p_1 + \dots + p_k$ through $e$. The positive
complexity can only decrease, because

$$\frac{(p_1 + p_2 + \dots + p_k)^2}{w_1 + \dots + w_k} \le \frac{p_1^2}{w_1} + \dots + \frac{p_k^2}{w_k}.$$

The last inequality follows from Jensen's inequality for the square function,

$$(a_1 x_1 + \dots + a_k x_k)^2 \le a_1 x_1^2 + \dots + a_k x_k^2,$$

with $a_i = w_i / (w_1 + \dots + w_k)$ and $x_i = p_i / a_i$.
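The merging inequality is easy to test numerically. The following Python sketch compares the cost of one merged arc with the total cost of the $k$ parallel arcs on random data; the function names and the random ranges are arbitrary choices made for illustration.

```python
import random

def merged_cost(p, w):
    """Cost of one arc carrying the total flow with the total weight."""
    return sum(p) ** 2 / sum(w)

def split_cost(p, w):
    """Total cost of k parallel arcs with flows p[i] and weights w[i]."""
    return sum(pi * pi / wi for pi, wi in zip(p, w))

random.seed(0)
for _ in range(1000):
    k = random.randint(1, 6)
    p = [random.random() for _ in range(k)]
    w = [random.uniform(0.1, 5.0) for _ in range(k)]
    assert merged_cost(p, w) <= split_cost(p, w) + 1e-12

# Equality holds when the flows are proportional to the weights.
print(merged_cost([1.0, 2.0], [1.0, 2.0]), split_cost([1.0, 2.0], [1.0, 2.0]))
```

When the flows are proportional to the weights, merging changes nothing; otherwise it strictly lowers the cost.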

One can transform a learning graph with additional labels into a learning graph as defined
in Section 3.1 using the above transformation repeatedly, and the complexity can only decrease.
Hence, the definition of the learning graph from Section 3.1 is optimal from the point of view of
complexity.

Remark 4. In our construction of the learning graph, the weights of the arcs leaving vertex $S$ do
not depend on the values of the variables inside $S$. By analysing the proof of Theorem 3, mostly
the fact that vertex $S$ is split into vectors $\{t_\alpha\}$, one can see that it goes through in a more general
setting. Namely, one can define weights of the arcs leaving vertex $S$ depending on the values
of the variables inside $S$. When calculating the positive and the negative complexities, one sums
up only the arcs that match the input. In this case, the negative complexity also depends on
the input. We didn't find this model useful for our applications, but it is possible that for other
problems it could provide some improvements.

4 Distinctness Problem: Learning by Stages 

In this section, we describe our approach to the learning graphs using stages. We illustrate our 
construction with the example of the distinctness problem. 

A stage is a slice of the learning graph consisting of arcs with similar functionality. Functionality
is described with respect to the flows used in the definition of the positive complexity. For each
input $x \in f^{-1}(1)$, we select a 1-certificate $\alpha$. Recall that the elements of the domain of $\alpha$ are called
marked. E.g., for the distinctness problem, we mark any two elements $a$ and $b$ having the same
value.

Different stages represent different relations of the arcs used in the flow to the marked elements.
For each problem, the number of stages doesn't depend on the size of the instance, i.e., it equals
$O(1)$. For the distinctness problem, for example, we have three stages, as in Table 1, independently
of $n$. Thus, on stage I, only arcs not loading $a$ or $b$ have non-zero flow; on stage II, only arcs loading
$a$; on stage III, only arcs loading $b$. The stages are written in the order they are used in the flow,
i.e., any random walk through the learning graph will use arcs of stage I first, then arcs of stage II, and so on.

I. Load $r - 2$ items different from $a$ and $b$.
II. Load a. 
III. Load b. 



Table 1: Stages for the distinctness problem. 

Any stage performs a transition from a subset of L-vertices to another subset of L-vertices. If
there are $k$ stages, there are $k + 1$ subsets: denote them by $V_0, \dots, V_k$; the $i$-th stage moves from
$V_{i-1}$ to $V_i$. The subset $V_0$ consists of the initial vertex $\emptyset$.

Now we describe the stages in more detail. Consider the $i$-th stage. Vertex $S \in V_{i-1}$ may be
connected by a transition $e$ to vertex $S' \in V_i$ only if $S \subset S'$. The length $\ell(e)$ of the transition is
defined as $|S' \setminus S|$. We denote the set of transitions of stage $i$ by $E_i$. The set of all transitions is
denoted by $E$.

We define a reduced learning graph as a graph on the vertex set $V_0 \cup V_1 \cup \dots \cup V_k$ with transitions
instead of arcs. For an input $x \in f^{-1}(1)$, one can define a flow through the reduced graph in
the same way as for a learning graph. We select a flow $p_e(x)$ on the reduced graph for each
input $x \in f^{-1}(1)$. We break the complexity of the whole learning graph into complexities of the
individual stages. The complexity of stage $i$ is defined by

$$C_i = \max_{x \in f^{-1}(1)} \sqrt{\Bigl(\sum_{e \in E_i} \ell(e)\, w_e\Bigr) \Bigl(\sum_{e \in E_i} \ell(e)\, \frac{p_e(x)^2}{w_e}\Bigr)}.$$



Proposition 5. For a reduced learning graph $G$ with $k$ stages and complexities $C_i$ of each stage,
one can build a learning graph of complexity $O(C_1 + \dots + C_k)$.

Proof. At first, we transform the reduced learning graph into a learning graph. Let $e$ be a transition
between $S$ and $S'$ with weight $w_e$. Fix an arbitrary ordering of elements of $S' \setminus S = \{s_1, \dots, s_{\ell(e)}\}$.
Represent $e$ by the path $S, S \cup \{s_1\}, S \cup \{s_1, s_2\}, \dots, S' \setminus \{s_{\ell(e)}\}, S'$ in the learning graph. In order to
simplify calculations, we assume paths for different transitions do not intersect. In other words,
we make a unique copy of a vertex for each transition that uses it as an internal vertex. We assign
weight $w_e$ to each arc of the path. Similarly, if $p_e$ is the flow through the transition, we set the flow
through each arc of the path equal to $p_e$.

It is easy to see that the negative and the positive complexities, when we consider only arcs on
stage $i$, are, respectively,

$$N_i = \sum_{e \in E_i} \ell(e)\, w_e \qquad \text{and} \qquad P_i = \sum_{e \in E_i} \ell(e)\, \frac{p_e^2}{w_e}.$$



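Now divide the weights of arcs on stage $i$ by $N_i$. Clearly, the negative complexity of the whole
learning graph becomes $k = O(1)$. The positive complexity of stage $i$ becomes $N_i P_i \le C_i^2$, so the
complexity of the learning graph becomes

$$O\Bigl(\sqrt{\textstyle\sum_{i=1}^{k} N_i P_i}\Bigr) = O(C_1 + \dots + C_k). \qquad \Box$$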
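The rescaling step of this proof can be illustrated with a few numbers. In the Python sketch below, the lists `N` and `P` hold hypothetical per-stage negative and positive complexities for one input; dividing the weights of a stage by its negative complexity turns that negative complexity into 1 and multiplies the positive complexity by the same factor, and the resulting total is compared with $\sqrt{k}$ times the sum of the per-stage values $\sqrt{N_i P_i}$.

```python
import math

def complexity_after_rescaling(N, P):
    """Complexity once the weights of stage i are divided by N[i]."""
    negative = float(len(N))                          # each stage now contributes N_i / N_i = 1
    positive = sum(Ni * Pi for Ni, Pi in zip(N, P))   # P_i is multiplied by N_i
    return math.sqrt(negative * positive)

N = [12.0, 30.0, 40.0]   # hypothetical negative complexities of three stages
P = [0.5, 0.1, 0.2]      # hypothetical positive complexities for one input
total = complexity_after_rescaling(N, P)
bound = math.sqrt(len(N)) * sum(math.sqrt(Ni * Pi) for Ni, Pi in zip(N, P))
print(total, bound)
```

Since the number of stages is constant, the factor $\sqrt{k}$ is absorbed into the $O$-notation.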



We call a transition valid if it satisfies the condition stated in the description of the stages,
i.e., if it is used in the flow. Whether a transition is valid depends on the input $x$, more precisely,
on the set of marked elements. For the distinctness problem, a transition on stage II is valid if it
loads $a$ and originates in a vertex without $b$. A transition on stage III is valid if it loads element
$b$ and comes from a vertex containing $a$. Similarly, we call a vertex valid if it has non-zero flow
through it.

At the moment of construction, we do not know which transitions are valid, so we allow for all
possibilities. E.g., for the distinctness problem, subsets $V_1$, $V_2$ and $V_3$ are the vertices of the learning
graph with $r - 2$, $r - 1$ and $r$ elements loaded, respectively, and we add transitions everywhere
possible. For an illustration, refer to Figure 1.

Let us describe the flow for the distinctness problem. It is already implicitly described in the
description of the stages. On stage I, we send a flow of $\binom{n-2}{r-2}^{-1}$ along any transition ending in a vertex
with both $a$ and $b$ not loaded. Denote such a vertex by $v$. Then we forward the flow along the
transition to $v \cup \{a\}$ and then to $v \cup \{a, b\}$ on stages II and III, respectively. The last L-vertex is
a sink, so we may stop.

Figure 1: The learning graph for the distinctness problem in case $n = 5$ and $r = 4$. Stages I, II
and III shown.

Let us calculate the exact expressions for complexities of the stages of the
learning graph for the distinctness problem. We assume all transitions have weight 1. We have



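$$C_1 = (r-2) \sqrt{\binom{n}{r-2} \binom{n-2}{r-2}^{-1}} = O(r), \tag{2}$$

$$C_2 = \sqrt{(n-r+2) \binom{n}{r-2} \binom{n-2}{r-2}^{-1}} = O(\sqrt{n}), \tag{3}$$

$$C_3 = \sqrt{(n-r+1) \binom{n}{r-1} \binom{n-2}{r-2}^{-1}} = O(n/\sqrt{r}), \tag{4}$$

where the asymptotic estimates assume $r = o(n)$.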



Hence, the total complexity, by Proposition 5, is

$$O(r + \sqrt{n} + n/\sqrt{r}),$$

which attains its optimal value $O(n^{2/3})$ when $r = n^{2/3}$.
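The stage complexities for the distinctness problem can also be evaluated exactly for concrete $n$ and $r$. The Python sketch below assumes unit weights, a flow of $\binom{n-2}{r-2}^{-1}$ on each valid transition, and a stage complexity equal to the square root of the product of $\sum_e \ell(e) w_e$ and $\sum_e \ell(e) p_e^2 / w_e$; the function name is hypothetical.

```python
import math

def distinctness_stage_complexities(n, r):
    """Exact complexities of stages I, II and III for unit weights."""
    valid = math.comb(n - 2, r - 2)      # (r-2)-subsets avoiding both marked elements

    def stage(length, transitions):
        # negative = length * transitions, positive = length / valid
        return math.sqrt(length * length * transitions / valid)

    c1 = stage(r - 2, math.comb(n, r - 2))
    c2 = stage(1, math.comb(n, r - 2) * (n - r + 2))
    c3 = stage(1, math.comb(n, r - 1) * (n - r + 1))
    return c1, c2, c3

n = 1000
r = round(n ** (2 / 3))
c1, c2, c3 = distinctness_stage_complexities(n, r)
print(r, c1, c2, c3, c1 + c2 + c3)
```

For this choice of $r$ the first and third stages dominate and are of the same order, which is what balancing $r$ against $n/\sqrt{r}$ predicts.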



5 Using symmetry 

In the following, we study symmetric problems, i.e., problems that stay invariant under a wide
group of permutations. For instance, the OR problem and the distinctness problem stay invariant
under the action of the full symmetric group on the input variables. The triangle problem is
invariant under permuting vertices of the graph. Denote the group that leaves the problem invariant
by $\Sigma$.

In fact, the symmetry of the problem is not the main thing that concerns us. What we really
use is that it is possible to select a collection of sets of marked elements so that, firstly, they cover
all possible inputs in $f^{-1}(1)$ and, secondly, group $\Sigma$ acts transitively on this collection. Because of
the transitivity, there is no real difference between the possible sets of marked elements. This property
can also be fulfilled for non-symmetric functions. For example, if all 1-certificates of a function $f$
have size at most $k$, one can consider all $k$-subsets as possible sets of marked elements. But this
may result in a sub-optimal algorithm.
Stage        I    II    III
Speciality   1    $n$   $n^2/r$
Length       $r$  1     1

Table 2: Parameters (up to a constant factor) of the stages of the learning graph for the distinctness
problem of Table 1.



5.1 Transitions 

We assume the learning graph stays invariant under the action of $\Sigma$. Call two transitions of the
reduced learning graph equivalent if one can be transformed into the other by an element of $\Sigma$,
i.e., if transition $e$ is from $S$ to $T$ and $e'$ is from $S'$ to $T'$, they are equivalent if there exists $\sigma \in \Sigma$
such that $\sigma(S) = S'$ and $\sigma(T) = T'$. We assume equivalent transitions have equal weights.

We define the speciality $T(e)$ of a transition $e$ as the ratio of the size of the equivalence class
containing $e$ to the number of valid transitions in it. It equals the inverse of the probability of
obtaining a valid transition when a random permutation from $\Sigma$ is applied to $e$.
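For small instances, speciality can be computed by direct enumeration. The Python sketch below does this for stages II and III of the distinctness problem, where all transitions of a stage form a single equivalence class under the full symmetric group. The function name and the choice of marked elements `a` and `b` are assumptions made for illustration.

```python
from itertools import combinations

def speciality(n, size, is_valid):
    """(number of transitions S -> S + {j} with |S| = size) / (number of valid ones)."""
    total = valid = 0
    for S in combinations(range(n), size):
        for j in range(n):
            if j in S:
                continue
            total += 1
            valid += bool(is_valid(frozenset(S), j))
    return total / valid

a, b = 0, 1          # hypothetical marked elements
n, r = 6, 4
stage2 = speciality(n, r - 2, lambda S, j: j == a and b not in S)   # loads a, b not loaded
stage3 = speciality(n, r - 1, lambda S, j: j == b and a in S)       # loads b, a already loaded
print(stage2, stage3)
```

Because the count does not depend on which pair is marked, the speciality is independent of the input, as the symmetry conditions require.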

Assume a flow $p_e(x)$ has been fixed for each input $x \in f^{-1}(1)$. We say the learning graph is
symmetric on stage $i$ if the following two conditions hold:

• speciality of each equivalence class does not depend on the input $x$;

• flows through all valid transitions in an equivalence class are equal, and their common value
does not depend on the input, but only on the equivalence class.

Similarly, one can define equivalence of vertices in set $V_i$ and define speciality of a vertex and
symmetry of the learning graph on set $V_i$.

Define the average length $L_i$ of stage $i$ by $\sum_{e \in E_i} p_e \ell(e)$. If the flow is symmetric, this quantity
does not depend on the input. Let $T_i$ denote the maximal speciality of a transition on stage $i$.

Theorem 6. Assume the flow is symmetric on stage $i$. Then, the complexity of the stage is at
most $L_i \sqrt{T_i}$.

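Proof. Consider $e \in E_i$ and let $p'(e)$ be the flow on a valid transition equivalent to $e$. By the
above assumptions, $p'(e)$ does not depend on the input. Hence, it is possible to define the weight
of transition $e$ equal to $p'(e)$. Then, since every valid transition carries flow $p_e = p'(e)$ and every
other transition carries none, the complexity of the stage is

$$\sqrt{\Bigl(\sum_{e \in E_i} \ell(e)\, p'(e)\Bigr) \Bigl(\sum_{e \in E_i} \ell(e)\, \frac{p_e^2}{p'(e)}\Bigr)} \le \sqrt{T_i L_i \cdot L_i} = L_i \sqrt{T_i}. \qquad \Box$$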



We leave it as an exercise for the reader to check that the flow for the distinctness problem in
Table 1 satisfies the conditions of Theorem 6, and to check Table 2, where each entry is given up to a
constant factor. Specialities are calculated in (2)-(4). Thus, the total complexity of the learning
graph for the distinctness problem is $r + \sqrt{n} + n/\sqrt{r}$, which is optimized when $r = n^{2/3}$.



5.2 Subroutines 

One can use subroutines in learning graphs in the following manner. Suppose we have a vertex
$S \subseteq [n]$ in a learning graph $G$. One can treat $S$ as the initial vertex for a problem with variable
set $[n] \setminus S$. Let $G'$ be a learning graph for this new problem. One can append $G'$ after $S$ in $G$.

Suppose, for some input $x \in f^{-1}(1)$, vertex $S$ has in-flow $\delta$. Let $M$ be the set of marked
elements. One can take a flow in $G'$ that corresponds to the set of marked elements $M \setminus S$. As in
an independent subroutine, the out-flow of vertex $S$ is 1. Let $P$ be the complexity of the flow. If
one multiplies the flow through each arc of $G'$ by $\delta$, this makes $S$ have the same in- and out-flows,
and the contribution towards the complexity of the flow on $G$ from all arcs inside $G'$ becomes $\delta^2 P$.
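The scaling step can be written out explicitly. As a sketch, write $\delta$ for the in-flow of the vertex to which the subroutine is appended, and $p_e$, $w_e$ for the flow and the weight of an arc $e$ of the appended graph $G'$ (this notation is assumed here). Scaling every flow by $\delta$ gives

```latex
\sum_{e \in G'} \frac{(\delta\, p_e)^2}{w_e}
  \;=\; \delta^2 \sum_{e \in G'} \frac{p_e^2}{w_e}
  \;=\; \delta^2 P .
```

The weights are untouched by this step, so the negative complexity of the appended graph is unchanged.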

We allow a subroutine stage in the reduced learning graph. It is the last stage, and it starts
in vertex set $V_k$. For each vertex of this set, we apply the procedure described in the previous
paragraph. For simplicity, we assume there is exactly one subroutine for each vertex.

One can also consider a subroutine on a strict subset $I \subset [n] \setminus S$ of the set of remaining input
variables. In this case, for each valid vertex $S \in V_k$, the subset $I \cup S$ should contain all marked
elements.

One can apply symmetry to the subroutine stage, similarly to how it is done for transitions. Let
$\ell(v)$ be the complexity of the subroutine appended to vertex $v$. Define the average complexity $L$
of the subroutine stage as $\sum_{v \in V_k} p_v \ell(v)$. Let $T$ denote the maximal speciality of a vertex in $V_k$.

Theorem 7. Suppose the flow is symmetric for vertex set $V_k$. Then, the complexity of the
subroutine stage is $L\sqrt{T}$.

Proof. The proof is similar to that of Theorem 6. Consider $v \in V_k$ and let $p'(v)$ be the flow
through a valid vertex equivalent to $v$. Assume the negative and the positive complexities of the
subroutine after $v$ both are $\ell(v)$. Multiply the weights of all arcs of the subroutine by $p'(v)$. Then
the complexity of the subroutine stage is

$$\sqrt{\Bigl(\sum_{v \in V_k} p'(v)\,\ell(v)\Bigr) \Bigl(\sum_{v \in V_k} \frac{p_v^2}{p'(v)}\,\ell(v)\Bigr)} \le L\sqrt{T}.$$

□

6 Triangle Problem 

Let us start with a learning graph for the triangle problem corresponding to the algorithm from
[8]. Denote the vertices of the triangle by $a$, $b$ and $c$. Stages of the learning graph are described in
Table 3.

I. Load a complete subgraph on $r - 2$ vertices that does not contain vertices $a$, $b$ and $c$.
II. Load all edges connecting $a$ to the subgraph.
III. Load all edges connecting $b$ to the subgraph (including $a$). This results in a complete
subgraph on $r$ vertices with $a$ and $b$ inside, but $c$ outside.
IV. Load $\ell$ edges that connect $c$ to $\ell$ vertices of the subgraph other than $a$ and $b$.
V. Load edge $ac$.
VI. Load edge $bc$.

Table 3: Stages for the triangle problem according to the algorithm of [8].

It is not hard to check that the obvious flow on this L-graph satisfies the conditions of Theorem 6.
The parameters of the stages are in Table 4. Hence, by Theorem 6, the complexity of the L-graph
is

$$O\Bigl(r^2 + r\sqrt{n} + n\sqrt{r} + \frac{\ell\, n^{3/2}}{r} + \frac{n^{3/2}}{\sqrt{r}} + \frac{n^{3/2}}{\sqrt{\ell}}\Bigr).$$

Stage        I       II    III       IV          V         VI
Speciality   $1$     $n$   $n^2/r$   $n^3/r^2$   $n^3/r$   $n^3/\ell$
Length       $r^2$   $r$   $r$       $\ell$      $1$       $1$

Table 4: Parameters (up to a constant factor) of the stages of the learning graph of Table 3.

It is optimized when $r = n^{3/5}$ and $\ell = n^{2/5}$. The optimum is $O(n^{13/10})$.

Let us analyze Table 4. The key point is to minimize the speciality of stage VI using the previous
stages. Also, the complexities of stages II and V are majorized by the complexities of stages III
and VI, respectively. Thus, we should concentrate on stages I, III, IV and VI. Their complexities
are $n^{6/5}$, $n^{13/10}$, $n^{13/10}$ and $n^{13/10}$, respectively. It looks like the resources of stage I are
not used to the full extent. The problem is in stage III: it is too long. The main idea behind our
construction is to reduce the length of stage III.

Theorem 8. The triangle problem can be solved in $O(n^{35/27})$ quantum queries with a bounded
error.

Note that $35/27 \approx 1.2963 < 1.3$, the exponent of the previous algorithm.

Proof. First, let us agree that for the vertices of the learning graph we will use the term L-vertex
in order to distinguish them from the vertices of the input graph. In the L-graph we will have
transitions, whereas in the input graph we have edges.

We use a symmetric approach based on stages as in Section 5. The symmetry group consists of
all permutations of vertices. Unlike the learning graph for the distinctness problem, where all valid
transitions on the same stage had the same flow through them, this time the flow will vary inside
one stage. Thus, when defining stages, we will not only specify the relation of the transitions used
in the flow to the marked elements, but also the value of the flow going through them.

To define the flow, we use the language of probability from Remark 2. Before we start the
description of the stages, let us clarify our usage of the term "random subgraph on $k$ vertices". By this
we understand a subgraph of the complete graph on $n$ vertices constructed in the following way.
Take a subset $U$ of $k$ vertices uniformly at random. Then, for each edge with both ends in $U$, add
it to the subgraph with some prescribed probability $s$, independently at random. The description
of the random subgraph contains both the vertex set and the selected edges. It is important to
add the vertex set to the description, as it might happen that some of the vertices inside $U$ end up
with degree 0.
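The construction can be sketched in Python as follows. This is a minimal illustration, assuming the vertices are labelled $0, \dots, n-1$; the function name `random_subgraph` and its parameters are chosen here and do not appear in the paper.

```python
import itertools
import random


def random_subgraph(n, k, s, rng=None):
    """Sample a 'random subgraph on k vertices' of the complete graph on n vertices.

    Returns (vertices, edges). The vertex set is part of the description,
    because a chosen vertex may end up with degree 0.
    """
    rng = rng or random.Random()
    # Step 1: a uniformly random k-subset U of the n vertices.
    vertices = frozenset(rng.sample(range(n), k))
    # Step 2: keep each edge inside U independently with probability s.
    edges = frozenset(
        pair
        for pair in itertools.combinations(sorted(vertices), 2)
        if rng.random() < s
    )
    return vertices, edges


if __name__ == "__main__":
    vertices, edges = random_subgraph(n=20, k=6, s=0.3, rng=random.Random(1))
    print(sorted(vertices), sorted(edges))
```

The expected number of edges is $s\binom{k}{2}$, which is where the $O(sr^2)$ average length of stage I comes from.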

I. Load a random subgraph on $r - 2$ vertices that does not contain vertices $a$, $b$ and $c$.
II. Randomly load edges connecting $a$ to the vertices of the subgraph.
III. Randomly load edges connecting $b$ to the subgraph (including $a$). The result is a
random subgraph on $r$ vertices with $a$ and $b$ inside the vertex set, but not $c$.
IV. Select those L-vertices that do not contain edge $ab$ and contain at least $\Omega(sr^2)$ edges.
V. Add edge $ab$.
VI. Add vertex $c$.
VII. Use a subroutine from Table 1 to load edges $ac$ and $bc$ out of all edges connecting $c$
to the vertices of the subgraph.

Table 5: Stages for the triangle problem.

The stages of the algorithm are described in Table 5. We will describe each stage in more
detail later, but let us say now that stage IV is different from the others. It is not a stage in the
sense of the previous definition; it is a modifier for the flow before it. The reason behind its
inclusion is that we want to apply Theorem 6 to all stages, although the actual flow does not satisfy
the conditions of the theorem. We solve this problem by constructing an ideal flow that satisfies the
conditions of Theorem 6, and prove that the actual flow has at most a constant times larger
complexity than the ideal one.

It is straightforward to check that the conditions of Theorem 6 are satisfied for stages I to III. For
example, the flow through a valid transition on stage III that originates in a subgraph having $m$
edges and loads $k$ edges is

$$\binom{n-3}{r-2}^{-1} s^{m+k} (1 - s)^{\binom{r}{2} - m - k}.$$

Stage               I        II     III       V       VII
Speciality          $1$      $n$    $n^2/r$   $n^2$   $n^3/(sr^2)$
Length/Complexity   $sr^2$   $sr$   $sr$      $1$     $r^{2/3}$

Table 6: Parameters (up to a constant factor) of the stages of the learning graph of Table 5.

The specialities of transitions on stages I, II and III are $O(1)$, $O(n)$ and $O(n^2/r)$, respectively,
independently of the equivalence class of the transition, by the same argument as for the distinctness
problem. The average length of a transition on stage I is $O(sr^2)$ by the standard probabilistic
argument. Similarly, conditioned on leaving a fixed valid L-vertex, the average length of a transition
on stages II and III is $O(sr)$. Hence, it coincides with the (unconditioned) average length of a
transition on stages II and III.

Now, we switch to stage IV. The set of L-vertices before and after the stage is the same and
the stage uses no transitions. What it does is modify the flow on stages I to III constructed so
far. Consider the L-vertices after stage III. In addition to the condition to contain $a$ and $b$ and
not to contain $c$, which is stated for stage III, we require the subgraph not to contain edge $ab$ and
to have at least $\Omega(sr^2)$ edges.

Denote by $p$ the probability that the L-vertex of the constructed random walk satisfies the new
constraints after stage III. By an easy probabilistic argument, under reasonable assumptions
($s = o(1)$ and $sr^2 = \omega(1)$), the probability is $1 - o(1)$. Assume the instance is large enough, so that
$p > 1/2$. Then, we scale up the flow $1/p$ times, and remove all flow going to the bad L-vertices.
After that the intensity of the flow is 1 again, and the flow on each transition has increased at
most 2 times. Hence, this operation changes the complexity of stages I to III by at most a factor
of 4, so we can ignore it in our calculations.

Now, consider stage V. Before the filtering on stage IV, the flow through a valid L-vertex
depended on the number of edges in it only. Hence, after the filtering, the flow also depends on
the number of edges only. Moreover, we can remove L-vertices with fewer edges than required on
stage IV, because they never have flow through them. Denote the flow through a valid L-vertex
with $m$ edges by $p_m$. On stage V, we connect an L-vertex with all possible L-vertices where an
edge connecting two vertices of the subgraph is added.

Let us calculate the speciality of a transition on stage V. For each equivalence class, the
probability that a random permutation of vertices identifies the edge being added with $ab$ is exactly
$\frac{2}{n(n-1)}$. Moreover, provided that this happens, the probability that $c$ is not used in the vertex
set of the subgraph is $(n - r)/(n - 2)$. Hence, under the assumption $r < n/2$, the speciality is $O(n^2)$.
The length of the stage is, clearly, 1.
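The product of the two probabilities above can be confirmed by exhaustive counting on small instances. The sketch below is a simplification made for illustration: it describes a stage V transition only by its vertex set and the added edge, ignoring the edges already loaded, and the names `valid_fraction` and `claimed_probability` are hypothetical. By symmetry, the probability that a random permutation maps a fixed transition to a valid one equals the fraction of such pairs that are valid.

```python
from fractions import Fraction
from itertools import combinations


def valid_fraction(n, r, a=0, b=1, c=2):
    """Fraction of (vertex set, added edge) pairs that are valid.

    A pair is valid when the added edge is ab and c is outside the vertex set.
    """
    total = 0
    valid = 0
    for subset in combinations(range(n), r):
        for edge in combinations(subset, 2):
            total += 1
            if set(edge) == {a, b} and c not in subset:
                valid += 1
    return Fraction(valid, total)


def claimed_probability(n, r):
    """The product 2/(n(n-1)) * (n-r)/(n-2) from the text."""
    return Fraction(2, n * (n - 1)) * Fraction(n - r, n - 2)


if __name__ == "__main__":
    for n, r in [(7, 3), (8, 3), (9, 4)]:
        p = valid_fraction(n, r)
        # The speciality is the inverse probability; compare it with n^2.
        print(n, r, p == claimed_probability(n, r), float(1 / p), n * n)
```

For $r < n/2$ the inverse probability is below $(n-1)(n-2)$, in line with the $O(n^2)$ bound on the speciality.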

Stage VI does not add any edges, hence its complexity is 0. It adds a special vertex to the
description of an L-vertex. The special vertex lies outside the vertex set of the subgraph. In this
way, it increases the speciality of an L-vertex. Let us calculate it. If a random permutation is
applied, the probability that the special vertex gets mapped to $c$ and both $a$ and $b$ are in the vertex
set of the subgraph is $O(r^2/n^3)$. This should be multiplied by the probability that edge $ab$ is among
the edges of the subgraph for a fixed choice of the vertex set containing both $a$ and $b$. This probability
is $O(m/r^2)$. Hence, the speciality of the L-vertex is $O(n^3/m) = O(n^3/sr^2)$ because of the filtering
on stage IV. Since, as proven in Section 5, the complexity of the distinctness-type learning graph
is $O(r^{2/3})$, the complexity of the last stage is $O(n^{3/2} s^{-1/2} r^{-1/3})$ by Theorem 7.

Our estimates are summarized in Table 6. Adding everything up, the complexity of the learning
graph is

$$O\bigl(sr^2 + sr\sqrt{n} + sn\sqrt{r} + n + n^{3/2} s^{-1/2} r^{-1/3}\bigr).$$

This is optimized when $r = n^{2/3}$ and $s = n^{-1/27}$. The optimum equals $O(n^{35/27})$. Note that
the complexities of stages I, III and VII are equal this time. □
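The balance of exponents can be verified with exact rational arithmetic. The sketch below assumes the five terms of the bound are $sr^2$, $sr\sqrt{n}$, $sn\sqrt{r}$, $n$ and $n^{3/2}s^{-1/2}r^{-1/3}$ (one per stage I, II, III, V and VII), writes $r = n^{\rho}$ and $s = n^{\sigma}$, and returns the exponent of $n$ in each term. The function name `term_exponents` and the symbols $\rho$, $\sigma$ are introduced here for illustration.

```python
from fractions import Fraction


def term_exponents(rho, sigma):
    """Exponent of n in each term of the bound when r = n**rho and s = n**sigma."""
    half = Fraction(1, 2)
    return {
        "I": sigma + 2 * rho,                            # s r^2
        "II": sigma + rho + half,                        # s r sqrt(n)
        "III": sigma + 1 + rho * half,                   # s n sqrt(r)
        "V": Fraction(1),                                # n
        "VII": Fraction(3, 2) - sigma / 2 - rho / 3,     # n^(3/2) s^(-1/2) r^(-1/3)
    }


if __name__ == "__main__":
    exps = term_exponents(Fraction(2, 3), Fraction(-1, 27))
    for stage, e in exps.items():
        print(stage, e)
    print("overall exponent:", max(exps.values()))
```

At $\rho = 2/3$ and $\sigma = -1/27$ the terms of stages I, III and VII all have exponent $35/27$, while those of stages II and V are smaller ($61/54$ and $1$).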

7 Summary and Future Work 

An approach towards quantum algorithms for functions with small 1-certificate complexity using 
span programs has been proposed in the paper. It seems to be at least as powerful as the previous 
approach of quantum walk on the Johnson graph. 

The analysis of the algorithm contains no spectral analysis. It uses an optimization of a 
quadratic function over the set of flows. 

A nice property of the new approach is that it has built-in tools for amortization. If some
computational paths in the program take fewer queries than others, the complexity of the program
is calculated using the average rather than the maximal cost, as is the case in the previous approaches.

Disadvantages of our method include that it can only be used for query complexity, and
the additional $\log m$ factor for non-Boolean functions.

Future work could consist of designing new algorithms based on this framework. Another
interesting problem is to remove the $\log m$ factor mentioned above.